In multiprocessor systems, various problems were treated with Lamport's logical clock and the 
resultant logical time orders between operations. However, one often needs to face the high 
complexities caused by the lack of logical time order information in practice. 

In this paper, the so-called physical time order is proposed based on the global clock in
multiprocessor systems. Concretely, we first utilize the global clock to associate a pending period
with each operation in a multiprocessor system, where the pending period is a time interval in
which the operation starts and ends. Afterwards, we define the physical time order for any pair of
operations with disjoint pending periods. The physical time order is an underlying characteristic
of any real execution in multiprocessor systems, because it is part of the truly-happened orders
obeying real physical time. Formally, the physical time order is proven to be independent of and
consistent with traditional logical time orders.

The above novel yet fundamental concepts enable new effective approaches for analyzing
multiprocessor systems, which are named pending period analysis as a whole. As a consequence of
pending period analysis, many important problems of multiprocessor systems can be tackled
effectively. As a significant application example, complete memory consistency verification, which
was known as an NP-hard problem, can now be solved with the complexity of $O(n^2 C^p p)$ by utilizing
physical time order information (where n and p are the number of operations and processors
respectively, and C is some constant). Moreover, two event ordering problems, which were proven to be
co-NP-hard and NP-hard respectively, can both be solved with the time complexity of $O(n C^p p)$
if restricted by pending period information.

Categories and Subject Descriptors: C.1.4 [Parallel Architectures]: General; F.1.2 [Modes of Computation]:
Parallelism and concurrency

General Terms: Parallel, order, clock, multiprocessor system, memory consistency

Additional Key Words and Phrases: Physical time order, verification, pending period



1. INTRODUCTION 

1.1 Motivation and Related Work

Many theoretical investigations of multiprocessor systems treat processors as distributed
spatially. In these investigations, Lamport's logical clock [27], which is known as a cornerstone
in the parallel and distributed computing areas, is often utilized to partially order
operations in multiprocessor systems. Given a pair of operations obeying some logical
time order obtained by a logical clock (such as processor order, execution order and so on),
the logically later operation should observe the effect of the logically earlier operation.
Once the logical time orders between all pairs of conflicting operations in a system are given,
the final execution result of the system is determined. Thus the logical order information,
if perfect, can reveal some intrinsic features of parallel and distributed computing
without a global clock, which was considered hard to achieve. However, in real multiprocessor
systems, it is often the case that the logical order information is far from perfect.
That is because observing the logical order information of all operations, especially the
low-level operations such as load, store, and synchronization instructions, is often impractical
in real systems. If purely relying on fragmentary logical time order information,
one has to infer or conjecture the orders between a large number of conflicting operations
[35; 13]. As a consequence, many application problems in multiprocessor systems (e.g.,
memory consistency verification [12; 13; 14], event ordering [35], and so on) suffer from
the resultant high computational costs, and thus are hard to solve.

Over the past decade, driven by the development of integrated circuit process,
SMP (Symmetric Multi Processors) and CMP (Chip Multi Processor) techniques, the
density of computing capacity of multiprocessor systems has been increasing fast. The resultant
scaling down of multiprocessor systems revives the intuitive idea of utilizing a global clock,
and a number of investigations that consider a global clock have been proposed.
Herlihy and Wing [19] proposed the concept of linearizability, which requires the accesses 
to the same memory location happening in disjoint time intervals with respect to a global 
clock, as a correctness condition of memory system. In [42], Singla et al. proposed a 
temporal memory model "delta consistency" to offer time window to coalesce write op- 
erations to the same memory location. In [38; 43], the global counters, which implicitly 
represent the global time, were employed to reason about the ordering of transactions in 
transactional memory [20]. The common idea behind the above investigations is to obtain 
logical order information (especially execution order about the same memory location) by 
explicitly or implicitly employing a global clock. 

Nevertheless, the implication of a global clock is far more than providing some com- 
plementary logical order information. A notable fact is that, in a multiprocessor system, 
the global clock actually provides a physical-time-based partial ordering containing all 
happened operations in the system, and the ordering is "nearly" a total order: Two opera- 
tions are not ordered by this ordering if and only if their precise performed times (the time 
when an operation is observed by all processors) on the global clock are exactly the same. 
The complementary logical order information obtained through the global clock is only 
a part of the above partial ordering, since the logical orders concern merely operations 
on the same processor or accessing the same location. In fact, the extra order information
obtained through the global clock, together with other logical order information, can
produce a transitive closure that further extends the acquirable order information. To the
best of our knowledge, few investigations have considered the above extra order information. Such
neglect is probably due to the traditional view that "logical time orders are enough to determine
the result of an execution, thus concerning extra order information, which does
not change the result of execution, is not necessary". However, in this paper, it is discovered
for the first time that the extra order information is powerful for simplifying many
problems in multiprocessor systems.

PREPRINT. July 12, 2009. 



Global Clock, Physical Time Order and Pending Period Analysis in Multiprocessor Systems • 3 



Despite the motivation given above, making better use of a
global clock in multiprocessor systems encounters three concrete difficulties
in practice. First, it is hard to obtain the precise performed time of an operation, even
if there is a precise physical global clock. The reason is that the exact performed time of
an operation is correlated with the hard-to-observe inner states of all processors and the
network of the multiprocessor system (e.g., whether an invalidation message has arrived at
some processor). Second, even if one makes some compromise and utilizes a time interval
which includes the precise performed time, instead of the precise performed time itself,
one must face new problems, e.g., how to deal with the overlapped time intervals of logically
ordered operations (noting that most modern processors can execute multiple operations
in an overlapped manner). Third, it may be difficult to observe and record the global time information
for all operations, since in a modern multiprocessor system there are too many operations
performed in every second. In this paper, these difficulties are tackled effectively and
efficiently by our approaches.

1.2 Our Contributions 

Given a global clock of a multiprocessor system in which the effect of an operation can
always be globally observed in bounded time^1, let us consider two bounding time points
for the performed time of each operation, named the start time and the end time. The resultant
time interval from the start time to the end time, which includes the performed time,
is called the pending period. As a relaxation of the performed time, the pending period
is easier to obtain than the precise performed time, which will be shown in
Section 3.1 in detail. It is worth noting that the concept of pending period is pervasive:
The pending periods of two operations in the same processor can be overlapped, which
enables the instruction-level parallelism (ILP) [25] adopted by most modern processors;
the pending periods of two operations accessing the same memory location can also be overlapped,
which leaves a space of overlapping memory accesses for efficiency optimization.
Thus the concept of pending period can be adopted in most memory models, from strong 
memory consistency models (such as linearizability [19], sequential consistency [28; 40]), 
to weak memory consistency models (such as weak consistency [10], release consistency 
[26]). 

On the other hand, once any two operations have disjoint pending periods (even if they
relate to neither the same processor nor the same memory location), there is a physical
time order between the two operations. In Section 2.3, it is proven that physical time
order is independent of and consistent with existing logical time orders. This is not surprising
since the physical time order is part of a truly-happened physical-time-based ordering with
respect to the global clock. Thus the physical time order is a natural order which must be
obeyed by any real execution in multiprocessor systems.

On the basis of global clock, pending period and physical time order, we introduce some 
effective approaches for tackling problems in the context of pending period, which are 
named pending period analysis as a whole. These approaches attempt to utilize the infor- 
mation brought by global clock to the full extent, involving but not limited to the traditional 
logical time order information. The first approach, assignment analysis, aims at assigning
values for pending periods of all operations when the pending periods of only part of the
operations can be observed directly. This approach can effectively handle the difficulty of
observing and recording an immoderate amount of operations by inferring pending periods
for part of the operations. The second approach, called frontier analysis, distinguishes the
operations with overlapped pending periods from those with disjoint pending periods, and
manages to prune the frontier graph [13]. As a consequence, the maximal number of nodes
in a frontier graph is significantly reduced from $O((n/p)^p)$ to $O(nC^p)$, while the maximal
number of edges is reduced from $O((n/p)^{p+1})$ to $O(nC^p p)$. Noting that many problems
in multiprocessor systems can be reduced to graph problems on the frontier graph, frontier
analysis may be applicable to reduce the time complexities of these problems. The
third approach, order analysis, further explores orders beyond the physical time order. The
order analysis aims at characterizing the so-called time global order, which is the transitive
closure of the physical and logical time orders. A result of order analysis is that any
cycle, in the TGO execution graph representing the time global orders in a system, can be
localized to involve merely $O(p)$ operations. This conclusion is important to guarantee the
correctness of a multiprocessor system.

^1 For the sake of brevity, when we are talking about a multiprocessor system, we imply that the
precondition does hold.

One established example for validating the effectiveness of physical time order and
pending period analysis is the well-known memory consistency verification problem, which
was known as an NP-hard problem [12; 13]. With the concept and approaches developed
in this paper, the problem can be solved with the time complexity of $O(n^2 C^p p)$ in the
context of pending period. This method has been employed in validation of an industrial CMP
[6; 23]. Additional examples are the event ordering problems which investigate the possible
orders between pairs of operations. The investigated problems have been proven to
be co-NP-hard and NP-hard respectively [35]. However, if these problems are restricted by physical
time order, then they can be solved with the time complexity of $O(nC^p p)$. The successful
applications of our approach demonstrate that the global clock, physical time order and
pending period analysis are effective and efficient in tackling various problems in
multiprocessor systems, especially those problems relating to the frontier graph.

The contributions of this paper can be summarized as follows: First, we provide a novel
notion for global clock in multiprocessor systems, showing that the physical-time-based
partial ordering information exported from the global clock, though it has been neglected in
some sense, is very powerful in tackling many problems in multiprocessor systems. Second,
the proposed physical time order establishes a novel, natural and fundamental
concept for multiprocessor systems, which is independent of but consistent with the traditional
logical time orders. Third, a set of approaches, called pending period analysis as a
whole, is developed and has been successfully used in two types of well-known application
problems in multiprocessor systems, one of which has been employed in industry. The
resultant new solutions for the application problems make significant improvements
in comparison with the previous results.

The terminology used in the rest of this paper is introduced in Table I. The rest of the
paper is organized as follows: Section 2 provides the definition of physical time order in
multiprocessor systems. Section 3 presents the approaches belonging to pending period
analysis. Section 4 introduces two related applications. Section 5 concludes the whole
paper.

Symbol                     Meaning
s                          Multiprocessor system
u, v (with a subscript)    Operation
w (with a subscript)       Write operation
r (with a subscript)       Read operation
O (with a subscript)       Set of operations
V (with a subscript)       Process / processor
P (with a subscript)       Path on a graph
P                          Set of paths on a graph
[illegible]                Set of cycles on a graph
C                          Cycle on a graph
f                          Frontier
n                          Number of operations
p                          Number of processes (processors)
C                          Constant
E                          Execution order
P                          Program order
PO                         Processor order
GO                         Global order
T                          Time order
TGO                        Time global order

Table I. Notations in this paper.



2. PHYSICAL TIME ORDER IN MULTIPROCESSOR SYSTEMS 

Lamport's logical clock [27] enables partially ordering the events (operations) occurring
at different processes (processors) of a distributed system. During the past thirty years,
his logical "happened-before" order has generated different types of logical time orders
for multiprocessor systems, such as processor order, execution order and so on. Briefly, if
we know all the "happened-before" orders (i.e., logical time orders) between
operations, the execution of these operations, which can be represented by a DAG (Directed
Acyclic Graph) named execution graph, is then determined. However, in practice it is
often the case that we can only acquire part of the logical time orders between operations.
Hence, if purely relying on the logical clock, one may need to infer and conjecture the
orders between pairs of operations, which may entail enormous search spaces.
As a result, there might be an intractably large number of candidate executions that do not
violate the obtainable information; many of them may be incorrect, and the incorrect ones
are difficult to tell apart.

An intuitive idea for tackling the above situation is to revive the idea of a global clock.
Probably, the simplest way of utilizing a global clock is to obtain the precise performed time
of all operations. However, because observing the precise performed time of one operation
may involve many components in the system, this ideal approach is rather impractical.
An alternative choice is to relax the precise performed time to the physical time interval
(i.e., the pending period) which includes the performed time. In this way, although the
compromise may eliminate some order information implied by the global clock, partially
capturing and utilizing the natural physical "happened-before" order implied by the physical
time becomes possible. Such an order is called physical time order.

In this section, we will first provide a brief introduction to traditional logical time orders.
After that, we will provide more detailed explanations for pending period and physical
time order, including the corresponding definitions. Finally, the relationship between the
physical and logical time orders is studied theoretically.

2.1 Logical Time Orders in Multiprocessor Systems 

In this subsection, we briefly introduce some traditional logical time orders in multiproces- 
sor systems, which are based on logical clocks. Concretely, let us review the definitions of 
program order, processor order and execution order, which are three well-known types of 
logical time orders in multiprocessor systems. 

Definition 2.1 (Program Order). Given two different operations $u_1$ and $u_2$ in the same
processor, we say that $u_1$ is before $u_2$ in program order iff $u_1$ is before $u_2$ in the program.
We denote this as $u_1 \xrightarrow{P} u_2$.

Definition 2.2 (Processor Order). Given two different operations $u_1$ and $u_2$ in the same
processor, we say that $u_1$ is before $u_2$ in processor order iff there is global agreement that
$u_1$ is before $u_2$ for all processors. We denote this as $u_1 \xrightarrow{PO} u_2$.

In multiprocessor systems, two operations are conflicting if they access the same mem- 
ory location and at least one of them is a store operation [41]. The execution order specifies 
the order between two consequent conflicting operations [21]. 

Definition 2.3 (Execution Order). We say that a write operation $w$ is before operation
$u$ in execution order iff $w$ is the latest write operation before $u$ that accesses the same
memory location as $u$. We denote this as $w \xrightarrow{E} u$. We say that a write operation $w$ is after
operation $u$ in execution order iff $w$ is the first write operation after $u$ that accesses the
same memory location as $u$. We denote this as $u \xrightarrow{E} w$.

In addition, the transitive closure of processor and execution orders is known as the 
global order: 

Definition 2.4 (Global Order). We say that operation $u_1$ is before operation $u_2$ in global
order iff $u_1$ is before $u_2$ in processor order, or $u_1$ is before $u_2$ in execution order, or $u_1$ is
before some operation $u$ in global order and $u$ is before $u_2$ in global order. Formally,

$$(u_1 \xrightarrow{GO} u_2) \leftrightarrow \left((u_1 \xrightarrow{PO} u_2) \lor (u_1 \xrightarrow{E} u_2) \lor (\exists u \in O : u_1 \xrightarrow{GO} u \xrightarrow{GO} u_2)\right).$$
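
Definition 2.4 can be read as a reachability computation. The following minimal Python sketch illustrates this reading; it assumes that operations are represented by hashable labels and that the processor-order and execution-order pairs are already available as sets of tuples. The function name `global_order` and the sample operations are hypothetical and serve illustration only.

```python
from collections import defaultdict


def global_order(po_edges, e_edges):
    """Return the transitive closure of processor order and execution order.

    Both arguments are iterables of (earlier, later) pairs. The result is a
    set of (u1, u2) pairs such that u1 is before u2 in global order.
    """
    successors = defaultdict(set)
    for earlier, later in list(po_edges) + list(e_edges):
        successors[earlier].add(later)

    closure = set()
    for source in list(successors):
        # Depth-first search collects every operation reachable from source.
        stack = list(successors[source])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        closure.update((source, target) for target in seen)
    return closure


# Hypothetical example: w1 -> r1 and r2 -> w2 in processor order,
# w1 -> r2 in execution order.
po = {("w1", "r1"), ("r2", "w2")}
e = {("w1", "r2")}
go = global_order(po, e)
```

In this example the pair `("w1", "w2")` belongs to the global order only through the intermediate operation `r2`, which corresponds to the third disjunct of the definition.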

So far we have introduced a number of traditional logical time orders in multiprocessor
systems. All the above orders are based on Lamport's "happened-before" logical
relation [27]: the logically former operation is before the logically latter operation in logical
time order. In practice, it is often the case that only parts of those relations can be
observed directly, especially the execution orders [17]. For example, to observe the write-write
execution orders, one may add specific hardware to the cache coherence maintainer [34].
Even if we can infer some hard-to-observe orders based on known orders, the number of
candidate executions for parallel programs may still be intractably large, since the case in
which we can infer all logical time orders is rare. To cope with this problem by exploiting
more information about the relations between operations, we propose the so-called
physical time order in the next subsection.

2.2 Physical Time Order

In a von Neumann architecture, an operation must be fetched into the processor before it is
executed, hence there is a start time for any operation. The above fact implies that, before
an operation starts, it cannot affect other operations. On the other hand, an operation can
be globally observed before it ends at some bounded time point, given some appropriate
hardware (e.g., store atomicity [1]) or supplementary software (e.g., broadcasting). Therefore,
given the global clock, we can assign a start time and an end time to each operation
in a multiprocessor system. Compared with the performed time which involves the states
of all processors, all caches and the network in the whole multiprocessor system, the start
time and the end time are easier to obtain because of their locality and flexibility.

Based on the start time and the end time of an operation, we provide the following 
definition of pending period. 

Definition 2.5 (Pending Period). The pending period of $u$ is the period from $t_s(u)$ to
$t_e(u)$. We say that an operation $v$ is in the pending period of operation $u$ iff the pending
periods of the two operations are overlapped.

The concrete methods of observing start and end times are architecture-dependent. Hence,
there are different observations and implementations for pending period and the related
supports. However, the following essential idea of pending period is invariable: The performed
time of an operation (i.e., the time when the operation is performed globally) must
be in its pending period. Therefore, regardless of the concrete definitions of start time
and end time, a partial order exists between two operations executing in disjoint pending
periods. We call the partial order physical time order.

Definition 2.6 (Physical Time Order). Given two operations $u$ and $v$, if the following

(1) the performed times of $u$ and $v$ are in their pending periods respectively;

(2) the end time of operation $u$ is before the start time of operation $v$

both hold, then we say that $u$ is before $v$ in physical time order. Formally,

$$(t_s(u) < t_p(u) < t_e(u)) \land (t_s(v) < t_p(v) < t_e(v)) \land (t_e(u) < t_s(v)) \rightarrow (u \xrightarrow{T} v). \qquad (1)$$

As illustrated in Figure 1(a), the pending periods of $u$ and $v$ are disjoint, thus there is
physical time order between $u$ and $v$, i.e., $u \xrightarrow{T} v$. In Figure 1(b), the pending periods of $u$
and $v$ are overlapped. Hence, the performed time of $v$ may be before the performed time of
$u$, and the performed time of $u$ may equally be before the performed time of $v$. Therefore, in
such a case there is no physical time order between $u$ and $v$. That is, if operation $v$ is in
the pending period of operation $u$, then $\neg(u \xrightarrow{T} v) \land \neg(v \xrightarrow{T} u)$ holds according to the above
definition.
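
The two cases discussed above, disjoint and overlapped pending periods, can be made concrete with a small Python sketch. It assumes that start and end times are plain numbers read from the global clock; the class `PendingPeriod`, the helper names and the time stamps are hypothetical and only illustrate Definitions 2.5 and 2.6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingPeriod:
    """Pending period of one operation, bounded by its start and end time."""
    start: float
    end: float

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start time must not exceed end time")


def physically_before(u, v):
    """True iff u is before v in physical time order (u ends before v starts)."""
    return u.end < v.start


def overlapped(u, v):
    """True iff neither operation is physically before the other."""
    return not physically_before(u, v) and not physically_before(v, u)


# Hypothetical time stamps for the two cases.
u = PendingPeriod(0, 10)
v_disjoint = PendingPeriod(15, 30)  # disjoint: u is before v in physical time order
v_overlap = PendingPeriod(5, 30)    # overlapped: no physical time order
```

Note that neither helper looks at processors or memory locations; only the recorded time points matter.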

Notably, the physical time order between $u$ and $v$ does not require that the two operations
are executing in the same processor or accessing the same memory location. Instead, it
simply depends on the pending periods of $u$ and $v$ that result from the physical time given by
the global clock.

[Figure 1: (a) disjoint pending periods of u and v, with time points ordered t_s(u), t_e(u), t_s(v), t_e(v);
(b) overlapped pending periods of u and v, with time points ordered t_s(u), t_s(v), t_e(u), t_e(v).]

Fig. 1. Time points of operations u and v on the time axis (left side is earlier), where $t_s(\cdot)$ and $t_e(\cdot)$ represent
start time and end time respectively.

2.3 Relationship between Physical and Logical Time Orders

In this subsection, we discuss the relationship between physical time order and traditional
logical time orders. Fundamentally, a remarkable difference between the former and the
latter is that physical time order is based on the physical global clock while logical time orders
are based on the logical clock. Due to the above difference, it is intuitive that physical time
order is independent of logical time order:

THEOREM 2.7 Time Order Independency Theorem. The physical time order and
logical time order are independent of each other.

Proof. First, let us consider a multiprocessor system in which the start times and the end
times of all operations are 0 and $\infty$ respectively. In such a system, there is no physical
time order between any pair of operations. However, there can be some logical time order
between operations. Therefore, the physical time order does not contain any logical time
order.

Second, let us consider a multiprocessor system in which all operations have nontrivial
assignments of pending periods, and access distinct memory locations. In such a system,
there is no logical time order between any two operations in different processors, while
there can be some physical time order between operations in different processors. Therefore,
no logical time order contains the physical time order. □

Another issue concerns the following question: do the physical time order and
logical time order contradict each other? As we know, the physical time order between
two operations implies that one operation physically happens before the other operation,
while the logical time order between two operations implies that one operation logically
happens before the other operation. Intuitively, if there is no bug in the multiprocessor
system, the logical clock should comply with the physical global clock.

To carry out the investigation, we need a common and natural criterion for comparing
physical time order with logical time orders. A solution is to use the physical time points
given by the global clock to characterize all the above orders, since physical time order is based
on the start and end time points, and logical time order can be represented as the relation
between the performed time points. Following this idea, we list all formulae linking the
orders to physical time points as follows:

(1) The performed time of operation $u$ is between the start time and the end time of $u$:

$t_s(u) < t_p(u) < t_e(u)$;

(2) If $u$ is before $v$ in physical time order, then the end time of $u$ is before the start time of
$v$:

$(u \xrightarrow{T} v) \rightarrow (t_e(u) < t_s(v))$;

(3) If $u$ is before $v$ in processor order, then the performed time of $u$ is before the performed
time of $v$:

$(u \xrightarrow{PO} v) \rightarrow (t_p(u) < t_p(v))$;

(4) If $u$ is before $v$ in execution order, then the performed time of $u$ is before the performed
time of $v$:

$(u \xrightarrow{E} v) \rightarrow (t_p(u) < t_p(v))$;

Based on the above formulae, we can prove the following theorem. 

THEOREM 2.8 Time Order Consistency Theorem. The physical time order is consistent
with the global order, i.e., the transitive closure of the processor and execution
orders. Formally,

$$(v \xrightarrow{T} u) \rightarrow \neg(u \xrightarrow{GO} v). \qquad (2)$$

Proof. According to Definition 2.6, it suffices to show that $(u \xrightarrow{GO} v) \rightarrow (t_s(u) < t_e(v))$.
Since $(u \xrightarrow{PO} v) \rightarrow (t_p(u) < t_p(v))$ and $(u \xrightarrow{E} v) \rightarrow (t_p(u) < t_p(v))$
both hold, by transitivity of partial order we obtain that $(u \xrightarrow{GO} v) \rightarrow (t_p(u) < t_p(v))$.
Since $t_s(u) < t_p(u)$ and $t_p(v) < t_e(v)$, $(u \xrightarrow{GO} v) \rightarrow (t_s(u) < t_e(v))$ holds. □
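
Theorem 2.8 suggests a simple sanity check on recorded data: a pair that is ordered by the global order must never be ordered the opposite way by the physical time order. The following Python sketch illustrates such a check under the assumption that pending periods are given as (start, end) tuples and that the global-order pairs are already known; the function name `consistency_violations` and the sample data are hypothetical.

```python
def consistency_violations(periods, go_pairs):
    """List (u, v) pairs where u is before v in global order although
    v is before u in physical time order.

    periods maps an operation label to its (start, end) pending period;
    go_pairs is an iterable of (u, v) pairs in global order.
    """
    violations = []
    for u, v in go_pairs:
        start_u, _ = periods[u]
        _, end_v = periods[v]
        if end_v < start_u:  # v ends before u starts, so v is physically before u
            violations.append((u, v))
    return violations


# Hypothetical pending periods of three operations.
periods = {"a": (0, 10), "b": (20, 30), "c": (5, 25)}
ok = consistency_violations(periods, [("a", "b"), ("a", "c"), ("c", "b")])
bad = consistency_violations(periods, [("b", "a")])
```

A non-empty result, as for `bad`, would indicate either an illegal pending period or an incorrect execution.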

The independency and consistency between the physical time order and traditional logical
time orders demonstrate that the former is novel yet natural. In the rest of the
paper, we will show that physical time order, together with our forthcoming approaches
named pending period analysis, is powerful for tackling many problems in multiprocessor
systems.

3. PENDING PERIOD ANALYSIS 

So far we have introduced the concepts of pending period and physical time order in the
context of global clock. In this section, three concrete approaches based on the aforementioned
concepts are proposed to analyze the problems in multiprocessor systems. These
approaches are named pending period analysis as a whole, since they concentrate on different
aspects of pending periods.

The first approach, assignment analysis, aims at obtaining the pending periods of operations.
As we have mentioned, the physical time order exists between two operations
only if the pending periods of the two operations are disjoint, thus knowing the pending
periods of operations is essential to obtaining the potential physical time order between
operations. Moreover, as a new restriction to multiprocessor systems, pending period may
forbid many candidate executions of parallel programs which violate the ordering imposed
by a global clock. Accordingly, the second approach, named frontier analysis, aims at
pruning the search space of candidate executions in the context of physical time orders
and pending periods. Finally, suppose we already know the pending periods of all
operations by some approach; then there are probably some operations between which no
physical time order holds because of their overlapped pending periods. The third approach,
order analysis, aims at tackling the undiscovered order for the operations with
overlapped pending periods. In this section, the above approaches will be introduced in
detail.

3.1 Assignment Analysis: Inferring Pending Periods 

To obtain all the physical time orders between operations, one must know the pending
periods (determined by the start and end times) of all operations in a multiprocessor system.
As we have mentioned in Section 2.2, there are different ways to observe pending periods
in different systems. For example, some dedicated registers are added to each
processor in Godson-3 [6] to observe the bounding time points of instructions (low-level
operations), including program_counter/start_time pairs of the last started operation and
program_counter/end_time pairs of the last ended operation [6]. A purely software method can
also be used to observe time points, especially for high-level operations, e.g., employing
some global memory address as a time counter.

On the other hand, in many cases, it is hard to observe directly the pending periods of all
operations if there are too many operations (especially low-level operations such as load,
store, and synchronization instructions). Furthermore, even if one can observe all the pending
period information, it may also be difficult to record all pending period information in
many systems if the speed of generating pending period information is faster than the speed
of recording pending period information.

To cope with the difficulty of obtaining and recording pending period information, the
assignment analysis is proposed to make a compromise by observing the pending periods for
only part of the operations, and inferring the pending periods for the rest of the operations according
to the following rule:

Inferred Pending Period Rule^2: In a multiprocessor system, given an operation set
$O = \{u_1, u_2, \ldots, u_n\}$ and an observed operation subset $O_{obs} = \{u_{obs_1}, u_{obs_2}, \ldots, u_{obs_m}\}$
($1 \le obs_1 < obs_2 < \cdots < obs_m \le n$), if the pending period $[t_s(u_{obs_j}), t_e(u_{obs_j})]$ has
been observed for each operation $u_{obs_j} \in O_{obs}$, then the pending period for any operation
$u \in O \setminus O_{obs}$, denoted by $[t_s(u), t_e(u)]$, can be inferred as follows:

$$t_s(u) = t_s(prec_{obs}(u))$$

$$t_e(u) = t_e(succ_{obs}(u))$$

where $prec_{obs}(u)$ and $succ_{obs}(u)$ are the last observed operation before $u$ in processor
order and the first observed operation after $u$ in processor order respectively^3. Formally,

$$(prec_{obs}(u) \in O_{obs}) \land (prec_{obs}(u) \xrightarrow{PO} u) \land \nexists j\, (prec_{obs}(u) \xrightarrow{PO} u_{obs_j} \xrightarrow{PO} u)$$

$$(succ_{obs}(u) \in O_{obs}) \land (u \xrightarrow{PO} succ_{obs}(u)) \land \nexists j\, (u \xrightarrow{PO} u_{obs_j} \xrightarrow{PO} succ_{obs}(u))$$

We call a pending period of an operation legal if it contains the performed time of the
operation. It is not difficult to prove the legality brought by the inferred pending period rule.

^2 In some multiprocessor systems supporting some weaker consistency other than sequential consistency, there
may be no total ordering for the operations on a single processor. Therefore, the $prec_{obs}$ or $succ_{obs}$ of $u$ may
not be unique. However, arbitrary $prec_{obs}$ and $succ_{obs}$ for $u$ are reasonable to provide a legal assignment of pending
period to $u$.

^3 To avoid the case in which there is no $prec_{obs}$ or $succ_{obs}$ for an unobserved operation, we require that each operation without
predecessor or successor operations in processor order should be observed.

PREPRINT. July 12, 2009. 






THEOREM 3.1 Inferred Pending Period Theorem. If the observed pending period of each
observed operation is legal, then the inferred pending period based on the inferred pending
period rule is legal.

Proof. Consider an operation $u \in O \setminus O_{obs}$. If the inferred start time of $u$ is 0,
then $t_s(u) = 0 < t_p(u)$; otherwise
$t_s(u) = t_s(prec_{obs}(u)) < t_p(prec_{obs}(u)) < t_p(u)$. Hence, $t_s(u) < t_p(u)$ holds.
If the inferred end time of $u$ is $\infty$, then $t_p(u) < \infty = t_e(u)$; otherwise
$t_p(u) < t_p(succ_{obs}(u)) < t_e(succ_{obs}(u)) = t_e(u)$. Hence, $t_p(u) < t_e(u)$ holds.
Since the performed time of $u$ is in the inferred pending period of $u$, the inferred pending
period is legal. □



[Figure 2: operations $u_1$ to $u_8$ annotated with start times $t_s$ and end times $t_e$.
Surviving labels: $t_e(u_1) = 20$, $t_s(u_8) = 100$, and $t_e(u_2) = \cdots = t_e(u_8) = 200$.]

Fig. 2. Time points of operations $u_2$ to $u_7$ are inferred with $u_1$ and $u_8$.

As shown in Figure 2, with only the observed pending periods of $u_1$ and $u_8$, the pending
periods of $u_2$ to $u_7$ can be inferred: their start times are the same as the start time of
$u_1$, and their end times are the same as the end time of $u_8$.

Considering $n$ operations, with assignment analysis one only needs to observe directly
the pending period of one operation out of every $m$ operations (for example, one out of
every 100 operations, i.e., $m = 100$, is observed directly in [6]); the pending periods of
the remaining $n(m-1)/m$ operations can then be inferred. As a result, the inferred pending
periods are looser than directly observed ones. However, this approach reduces the difficulty
of observing and recording pending period information.
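The inferred pending period rule can be illustrated with a minimal Python sketch. It assumes the operations of a single processor are given in a total processor order and that the first and last operations are observed (as the footnote above requires). The names `Op` and `infer_pending_periods`, and the time values in the example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Period = Tuple[float, float]  # (t_s, t_e)


@dataclass
class Op:
    name: str
    period: Optional[Period] = None  # None means "not observed"


def infer_pending_periods(ops: List[Op]) -> Dict[str, Period]:
    """Assign a pending period to every operation of ONE processor.

    `ops` must be listed in processor order. Observed operations keep their
    period; an unobserved one gets t_s of the last observed operation before
    it and t_e of the first observed operation after it.
    """
    if not ops or ops[0].period is None or ops[-1].period is None:
        raise ValueError("the first and last operations must be observed")

    # Forward pass: start time of the closest observed predecessor.
    starts: List[float] = []
    last_start = ops[0].period[0]
    for op in ops:
        if op.period is not None:
            last_start = op.period[0]
        starts.append(last_start)

    # Backward pass: end time of the closest observed successor.
    ends: List[float] = [0.0] * len(ops)
    next_end = ops[-1].period[1]
    for i in range(len(ops) - 1, -1, -1):
        if ops[i].period is not None:
            next_end = ops[i].period[1]
        ends[i] = next_end

    return {
        op.name: (op.period if op.period is not None else (s, e))
        for op, s, e in zip(ops, starts, ends)
    }


# Eight operations; only u1 and u8 are observed (made-up time values).
ops = [Op("u1", (10, 20))] + [Op(f"u{i}") for i in range(2, 8)] + [Op("u8", (100, 200))]
periods = infer_pending_periods(ops)
```

In this sketch every unobserved operation receives the period from the start time of `u1` to the end time of `u8`, which is looser than a directly observed period but still contains the performed time whenever the observed periods are legal.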

3.2 Frontier Analysis: Pruning Frontier Graph 

For multiprocessor systems, it is a natural problem to analyze the candidate executions of
a parallel program. When the logical time order information is "perfect", i.e., the orders
between all pairs of operations (where such orders exist) are known to us, the above problem
is trivial. However, since it is often the case that we can only observe and infer limited
logical time order information (especially execution order), we may need to consider a number
of candidate executions of the same program that do not conflict with the obtainable
order information. If the available order information is insufficient, some executions that
are actually illegal (violating memory consistency or cache coherence) may still be
considered as candidate executions. Unfortunately, distinguishing such an illegal execution
from a legal one might be very time-consuming [12; 17]. In this section, we propose
an effective and efficient approach, named frontier analysis, to prune the space
of candidate executions of a parallel program. The frontier analysis approach is based on
the aforementioned concepts of physical time order and pending period. To present our
approach, we begin with Gibbons and Korach's notion of frontier graph [13]. Following this
brief introduction, we show how the frontier graph is pruned by frontier analysis, and
provide the corresponding complexity analysis.

Let us briefly introduce Gibbons and Korach's concept of frontier graph. Consider $p$
processes $\mathcal{P}_1, \ldots, \mathcal{P}_p$, and $p$ sets $O_1, \ldots, O_p$ containing all
the operations executing at these $p$ processes respectively. A frontier $f(u_1, \ldots, u_p)$
is a tuple of $p$ operations, where $\forall i \in \{1, \ldots, p\}$, $u_i \in O_i$. Gibbons and
Korach [13] further proposed the frontier graph, consisting of all possible frontiers in the
system and the directed edges connecting them. It starts with a starting frontier consisting
of $p$ NULL operations, which represents the situation that no operation has executed at the
very beginning. It ends with a terminating frontier consisting of the terminating operations
at the $p$ processes, which represents the situation that all operations have begun at the very
end. If a new operation $u_i' \in O_i$ happens, the frontier $f(u_1, \ldots, u_i, \ldots, u_p)$
is updated to another frontier $f'(u_1, \ldots, u_i', \ldots, u_p)$, and there is a directed edge
from $f$ to $f'$ in the frontier graph. Therefore, each candidate execution can be mapped to a
path from the starting frontier to the terminating frontier on the frontier graph.

In multiprocessor systems, a frontier can be viewed as a snapshot of the executing operations
in the $p$ processors: the executing operations in the processors
$\mathcal{P}_1, \ldots, \mathcal{P}_p$ are $u_1, u_2, \ldots, u_p$ respectively. Intuitively,
many important problems of multiprocessor systems that inherently involve search spaces of
candidate executions, such as memory consistency verification and event ordering problems,
can be transformed to graph problems on the frontier graph. The complexities of solving
these problems directly relate to the size of the frontier graph. Unfortunately, according to
Gibbons and Korach [13], there are $O((n/p)^p)$ possible frontiers in total, which results in
the intractability of many multiprocessor system problems. In [13], Gibbons and Korach
proposed to use additional information such as read mapping (the mapping from every read to
the write sourcing its value) and total write order (a total order of the writes to each memory
location) to simplify the traversal of the frontier graph. Intuitively speaking, read mapping
and total write order reduce the number of possible edges connecting to each frontier by
specifying the relations between write and read operations, and they also reduce the number of
reachable frontiers. Consequently, the time for finding a path from the starting frontier to the
terminating frontier on the frontier graph is also reduced. However, since read mapping
and total write order may involve all processors in a system, observing and restricting read
mapping and total write order (especially the latter) might be difficult in practice [17].

Instead of relying on logical information such as read mapping and total write order, we
infuse natural information, namely physical time orders and pending periods, into the
frontier graph, and prune the frontier graph accordingly in frontier analysis. The idea of
frontier analysis is quite straightforward: since the operations in the same frontier execute
in an overlapped manner, an operation in a frontier is in the pending periods of the other
operations belonging to the same frontier. Meanwhile, the physical time orders eliminate
the frontiers which contain two or more operations with disjoint pending periods. In the
example shown in Figure 3, the cloud represents the pending period of operation $u_{23}$.
Only the shaded operations inside the cloud can appear in the frontiers involving $u_{23}$,
since the operations outside the pending period of $u_{23}$ cannot execute in an overlapped
manner with $u_{23}$.

Fig. 3. The operations in the frontier involving operation $u_{23}$ should be in the pending period of $u_{23}$.

With the above natural property, we can recount the number of feasible frontiers
(in the context of physical time order) which contain a specific operation in a $p$-processor
system. A feasible frontier containing $u_i$ can be obtained by picking $p - 1$ operations,
one for each of the other processors, from the operations whose pending periods overlap with
that of $u_i$, and filling them into the corresponding seats of the frontier. Moreover, since
each operation in the systems discussed in this paper can be globally observed in bounded
time, we let $B$ be the upper bound of the lengths of all pending periods. Hence, in one
processor, the number of operations within any pending period is no larger than $cB$ (where
$c$ is a constant, and we let $C = cB$), which does not depend on the number of operations $n$.
Therefore, there can be at most $C^{p-1}$ feasible frontiers involving $u_i$. Meanwhile, since
we have at most $n$ different choices when determining the specific operation $u_i$, the total
number of feasible frontiers in the frontier graph is significantly reduced from
$O((n/p)^p)$ to $O(nC^{p-1})$. Since the number of processors $p$ is a constant for a given
multiprocessor system, the number of feasible frontiers is actually $O(n)$ for a given system.
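The pruning effect can be seen on a toy instance. The Python sketch below is only an illustration under simplifying assumptions: a frontier is modelled as one operation per processor (NULL operations are omitted), pending periods are closed intervals with made-up values, and the enumeration is brute force, so it demonstrates which frontiers survive rather than the $O(nC^{p-1})$ bound itself. The function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Period = Tuple[int, int]          # (t_s, t_e)
NamedOp = Tuple[str, Period]      # (operation name, pending period)


def overlaps(a: Period, b: Period) -> bool:
    """Closed intervals overlap iff neither one ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def feasible_frontiers(procs: List[List[NamedOp]]) -> List[Tuple[str, ...]]:
    """Keep only frontiers whose operations have pairwise overlapping periods.

    A frontier containing two operations with disjoint pending periods is
    ruled out by physical time order.
    """
    kept = []
    for combo in product(*procs):
        pairwise_ok = all(
            overlaps(x[1], y[1])
            for i, x in enumerate(combo)
            for y in combo[i + 1:]
        )
        if pairwise_ok:
            kept.append(tuple(name for name, _ in combo))
    return kept


# Two processors with four operations each (hypothetical pending periods).
proc0 = [("a0", (0, 9)), ("a1", (10, 19)), ("a2", (20, 29)), ("a3", (30, 39))]
proc1 = [("b0", (5, 14)), ("b1", (15, 24)), ("b2", (25, 34)), ("b3", (35, 44))]
frontiers = feasible_frontiers([proc0, proc1])
```

Here 7 of the 16 possible frontiers remain. A practical implementation would likely avoid the full product, for example by sliding a time window over operations sorted by start time.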

Further, the number of edges in the pruned frontier graph can also be calculated. There are
$p$ operations in a frontier, and within the pending period of a specific operation there are
at most $Cp$ operations across the $p$ processors. Hence, for any frontier, there are at most
$Cp$ operations that can extend the frontier, so the number of outgoing edges from each
frontier is bounded from above by $Cp$. Recalling that the number of feasible frontiers is at
most $O(nC^{p-1})$, the total number of edges in the frontier graph is $O(nC^p p)$ in the
context of pending periods and physical time orders. As discussed in the previous paragraph,
this asymptotic order is in fact $O(n)$ for a given system.

To sum up, with pending period information, frontier analysis enables one to deal with 
the pruned frontier graph, which has only linear numbers of nodes and edges with respect 
to the number of operations. Many problems relating to the frontier graph can thus be 
simplified. A crucial characteristic of frontier analysis is that it has successfully utilized 
the physical time orders, which may include some orders other than logical time orders, to 
localize the computation of many undiscovered orders. 

3.3 Order Analysis: A Technique Beyond Physical Time Order

As a new dimension of ordering relations in multiprocessor systems, the physical time
order can order some operations that are concurrent from traditional points of view. However,
it is still problematic to say that two operations with neither physical nor logical time
orders are concurrent. In this section, we present a new order that goes beyond physical and
logical time orders, and then study the related technique, order analysis, on the basis of the
new order.

As we have mentioned in Section 2.2, the physical time order is independent of and consistent
with traditional logical time orders. Hence, it is natural to consider the combination
of physical and logical time orders. Formally, the transitive closure of the physical
and logical time orders, which is named time global order, is defined as follows:

Definition 3.2 (Time Global Order). We say that operation $u_1$ is before operation $u_2$ in
time global order iff $u_1$ is before $u_2$ in processor order, or $u_1$ is before $u_2$ in
execution order, or $u_1$ is before $u_2$ in physical time order, or $u_1$ is before some
operation $u$ in time global order and $u$ is before $u_2$ in time global order. Formally,

$(u_1 \xrightarrow{TGO} u_2) \Leftrightarrow ((u_1 \xrightarrow{PO} u_2) \vee (u_1 \xrightarrow{E} u_2) \vee (u_1 \xrightarrow{T} u_2) \vee (\exists u \in O : u_1 \xrightarrow{TGO} u \xrightarrow{TGO} u_2))$.

Time global order is not the simple addition of physical and logical time orders. Due to 
the transitivity of partial order, two operations with neither physical time order nor logical 
time order may have certain time global order. 
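As a small illustration of this point, the following Python sketch builds the time global order as the transitive closure of processor order, execution order, and physical time order. It assumes that physical time order holds between two operations exactly when the pending period of the first ends before that of the second starts; the function names and the three-operation example are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def time_global_order(
    ops: List[str],
    po: Iterable[Edge],
    eo: Iterable[Edge],
    periods: Dict[str, Tuple[int, int]],
) -> Set[Edge]:
    """Return all pairs (a, b) such that a is before b in time global order."""
    closure: Set[Edge] = set(po) | set(eo)
    # Physical time order: disjoint pending periods, a ends before b starts.
    for a in ops:
        for b in ops:
            if a != b and periods[a][1] < periods[b][0]:
                closure.add((a, b))
    # Warshall-style transitive closure over the combined edge set.
    for k in ops:
        for i in ops:
            if (i, k) in closure:
                for j in ops:
                    if (k, j) in closure:
                        closure.add((i, j))
    return closure


def has_cycle(closure: Set[Edge], ops: List[str]) -> bool:
    """A cycle exists iff some operation is ordered before itself."""
    return any((u, u) in closure for u in ops)


ops = ["a", "b", "c"]
periods = {"a": (0, 10), "b": (11, 30), "c": (5, 25)}
# a and c overlap in time and share no logical order, yet a -> b holds in
# physical time order and b -> c in execution order, so a -> c in TGO.
tgo = time_global_order(ops, po=[], eo=[("b", "c")], periods=periods)
# Adding the execution-order edge c -> a closes a cycle a -> b -> c -> a.
tgo_bad = time_global_order(ops, po=[], eo=[("b", "c"), ("c", "a")], periods=periods)
```

The cubic closure is used here only for brevity; it is not the localized checking procedure that order analysis relies on.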

Given the definition of time global order, let us now come to a novel technique named order
analysis. Order analysis exploits time global orders between operations, and
utilizes them to check the correctness of an execution. As we know, the correctness of
an execution in a multiprocessor system is equivalent to the absence of cycles in
the corresponding execution graph, where the execution graph is a directed graph (a DAG when
the execution is correct) with its nodes representing operations and its directed edges
representing the orders between operations (traditionally, these orders are processor order
and execution order). Given the pending period information, a correct execution must further
comply with the time global order. Hence, checking the correctness of an execution is then
equivalent to searching for a cycle in the corresponding execution graph, which contains edges
corresponding to processor order, execution order, and physical time order. We call this type
of graph the TGO execution graph [6].

Let $\mathcal{C}_u$ be the set of all cycles including operation $u$ in the TGO execution graph.
Furthermore, let $C$ be a cycle belonging to $\mathcal{C}_u$ ($C \in \mathcal{C}_u$), so that
$u$ is an operation in $C$ ($u \in C$). Intuitively, for any operation $u$, there are three
kinds of cycles containing $u$: 1) all operations of the cycle except $u$ are not in the
pending period of $u$; 2) some operations of the cycle are not in the pending period of $u$,
while other operations are in the pending period of $u$; and 3) all operations of the cycle
are in the pending period of $u$.

Concerning the first kind of cycles which mainly involves operations outside the pending 
period of u, the following lemma proves that they can be reduced to specific local cycles 
involving two operations. 

Lemma 3.3 [6]. Given a time global order cycle $C$ containing operation $u$, if all
operations in $C$ except $u$ are before $u$ in physical time order, there must be a write
operation $w$ in cycle $C$ which is after $u$ in execution order. Formally,

$(\forall v \in C : (v \neq u) \rightarrow (v \xrightarrow{T} u)) \rightarrow (\exists w \in C : u \xrightarrow{E} w)$.

Proof. Given that operation $v'$ is the successor of $u$ in cycle $C$, there are three
situations for us to consider: $u \xrightarrow{T} v'$, $u \xrightarrow{PO} v'$, and
$u \xrightarrow{E} v'$. We know that all operations in $C$ except $u$ are before $u$ in
physical time order, therefore $u \xrightarrow{T} v'$ does not hold. Since $t_p(v')$ is before
$t_p(u)$, $u \xrightarrow{PO} v'$ does not hold. Hence, $u \xrightarrow{E} v'$ holds.
Furthermore, if $v'$ is a read operation, $v'$ certainly cannot get the value of $u$ from the
future, and $u$ cannot be before $v'$ in execution order. Therefore $v'$ is a write operation.
Thus the lemma is proved. □

As shown in Figure 4, the operations $u$, $u_1$, $u_2$ and $u_3$ constitute a cycle in the TGO
graph, where $u_1$, $u_2$ and $u_3$ are before the pending period of $u$. Based on the previous
lemma, the cycle can be reduced to a small cycle
$u \xrightarrow{E} w \xrightarrow{T} u$, where $w$ is the write operation given by the lemma.

Fig. 4. The operations $u$, $u_1$, $u_2$ and $u_3$ constitute a cycle in the TGO graph, where
$u_1$, $u_2$ and $u_3$ are before the pending period of $u$ (the clouds in the figure).

Similarly, the second kind of cycles can be reduced to cycles involving merely one
operation outside the pending period of $u$. Moreover, the third kind of cycles are obviously
local cycles inside the pending period of $u$. Based on the above locality of cycles in the
TGO execution graph, we propose three correctness rules in Theorem 3.4 to check the acyclicity
of a TGO execution graph locally. Each correctness rule is related to one kind of cycle
mentioned above.

THEOREM 3.4 Checking Rules Theorem [6]. There is no cycle in the TGO execution
graph of the execution iff for any operation $u$ of the execution, the following three
correctness rules hold (with $\mathcal{C}_u$ the set of cycles containing $u$):

Rule 1: $\forall w \in O : (w \xrightarrow{T} u) \rightarrow \neg(u \xrightarrow{E} w)$;
Rule 2: $\forall v, v' \in O : (v \xrightarrow{T} u) \wedge (v' \xrightarrow{GO} v) \rightarrow \neg(u \xrightarrow{GO} v')$;
Rule 3: $\neg(\exists C \in \mathcal{C}_u : (\forall v \in C : \neg(u \xrightarrow{T} v \vee v \xrightarrow{T} u)))$.

Proof. ($\Rightarrow$) We assume that there is no cycle in the TGO execution graph of the
execution. For Rule 1, given that a write operation $w$ satisfies $w \xrightarrow{T} u$,
$u \xrightarrow{E} w$ does not hold; otherwise there would be a cycle
$w \xrightarrow{T} u \xrightarrow{E} w$. For Rule 2, if operations $u$, $v$ and $v'$ satisfy
$(v \xrightarrow{T} u) \wedge (v' \xrightarrow{GO} v)$, then $u \xrightarrow{GO} v'$ does not
hold; otherwise there would be a cycle
$v' \xrightarrow{GO} v \xrightarrow{T} u \xrightarrow{GO} v'$. Rule 3 is trivial: since there is
no cycle in the whole graph, there certainly is no cycle in the pending period of $u$. Hence
"$\Rightarrow$" is proved.

($\Leftarrow$) We use proof by contradiction. Let us assume that Rules 1, 2 and 3 all
hold, but there is a cycle $C$. Let operation $u$ be the last performed operation in the cycle.
According to Rule 3, there must be some operation outside the pending period of $u$. We
traverse $C$ starting from $u$. Let $v$ be the first operation before $u$ in physical time order
encountered when traversing $C$. Since $u$ is the last committed operation in the cycle,
$u \xrightarrow{T} v$ cannot hold; instead, we have $v \xrightarrow{T} u$. If all operations
except $u$ in cycle $C$ are before $u$ in physical time order, then according to Lemma 3.3 there
must be some operation $w$ such that $u \xrightarrow{E} w$, which contradicts Rule 1.
Hence, there must be some operation in the pending period of $u$. Let $v'$ be the predecessor
of $v$ in $C$. As shown in Figure 2, $u \xrightarrow{TGO} v'$ and $v' \xrightarrow{TGO} v$.
Let the edge $a \xrightarrow{T} b$ be the first physical time order edge on the path from $u$
to $v$ in $C$. According to the definition of physical time order, we obtain
$t_p(u) < t_p(a) < t_p(b)$. However, since $u$ is the operation committed last in the cycle
$C$, $t_p(u)$ cannot be before $t_p(b)$, and we reach a contradiction. Therefore, there is no
physical time order edge from $u$ to $v$ in $C$. As a result, $u \xrightarrow{GO} v'$ and
$v' \xrightarrow{GO} v$ hold. But
$u \xrightarrow{GO} v' \xrightarrow{GO} v \xrightarrow{T} u$ contradicts Rule 2. Thus
"$\Leftarrow$" is proved. □

In a real system, Rule 1 checks for incorrect performed times: the performed time of an
operation is outside its pending period, so its effect cannot be observed by some operations
with later and disjoint pending periods. To check Rule 1, one needs to check whether the
latest write before $u$ in physical time order has propagated to $u$. Rule 2 focuses on ordering
bugs between operations inside and outside of the pending period. To check Rule 2, one
needs to check all operations before $u$ in global order to find cycles as shown in Figure 2.
Rule 3 focuses on cycles inside the pending period.

Theorem 3.4 not only shows how to check whether a TGO execution graph is acyclic,
but also limits the complexity of checking. That is because checking one operation only
involves at most $Cp$ operations: Rule 1 involves only a constant number of operations,
while checking Rules 2 and 3 requires traversing all operations in the pending period of $u$
(at most $Cp$ of them, see also Section 3.2). As a consequence, checking the
correctness of an execution has complexity $O(nCp)$.

Order analysis goes beyond both physical and logical time orders, since it employs
the time global order. This approach is effective for handling not only correct behaviors but
also erroneous behaviors of multiprocessor systems (e.g., cycles in the execution graph).

4. APPLICATIONS

In this section, we present two examples, memory consistency verification and event
ordering problems, to illustrate the significance of our proposed concepts and approaches.

4.1 Memory Consistency Verification Problem 

In most modern multiprocessor systems, the memory subsystem employs complex hierarchies
with many hardware resources to support shared memory, which may contain bugs in
memory consistency and cache coherence. Furthermore, parallel programs and compilers
may also introduce violations of memory consistency. Even a subtle memory consistency bug
may lead to erratic behavior, which is difficult to find and debug. The memory
consistency verification problem aims at checking the execution of a parallel program
against a given memory consistency model, and has received wide attention in both academia
and industry. Gibbons and Korach proved that the VSC problem (verifying sequential
consistency) is NP-hard with respect to the number of memory operations [12]. Further,
they also studied the complexity of the VSC problem with some additional information
and constraints [13; 14], and their results include: 1) With read mapping, which maps every
read to the write sourcing its value, the obtained VSC-read problem is still NP-complete.
2) With total write order, which totally orders the write operations for each memory location,
the obtained VSC-write problem is also NP-complete. 3) With both read mapping and
total write order, the obtained VSC-conflict problem belongs to P. However, read mapping
imposes restrictions on the executing parallel program, while obtaining the total write order
even requires adding specialized hardware [34; 9]. Some investigations manage to obtain
a relatively low complexity at the cost of losing completeness (these methods still require
read mapping information) [17; 32; 33; 39]. Nevertheless, without total write order infor- 
mation, even the method with least time complexity among these incomplete methods still 
require the time complexity 0{n^) [33]. Henzinger et al. [18] sought natural restriction 
on multiprocessor systems, which bounds the number of pending operations in the system,
to enable formal verification of a high-level system description against memory
consistency. However, that method targets small, manually constructed system models and
cannot handle real designs.

In our previous work [6], we proposed a fast and complete memory consistency verification
method. Our method requires neither read mapping nor total write order, which makes
it easy to generalize. The method only needs to periodically observe the pending periods of
part of the operations, and then assigns a pending period to each operation using the
assignment analysis approach presented in Section 3.1. As shown in Algorithm 1, the framework
of the algorithm is inherited from [13]: it traverses the frontier graph to find a path from
the starting frontier to the terminating frontier. At each frontier (node of the frontier
graph), the CHECKING_VIOLENCE function checks for cycles in the current execution graph: if
any cycle is found, the algorithm backtracks to the previous frontier and selects another
branch to explore; if the current execution graph is proven to be acyclic, the algorithm
moves forward to a next frontier. Each time the algorithm moves forward in the frontier graph,
some execution order edges are added to the current execution graph; each time it backtracks,
some execution order edges are removed from the current execution graph. Once the
current frontier $f$ reaches the terminating frontier and no cycle is found, the execution
is proven to comply with the memory consistency model. If the current frontier $f$ has
backtracked to the starting frontier and no other branch can be selected, the execution is
proven to violate the memory consistency model.

Algorithm 1: Algorithm Framework of Memory Consistency Verification

INT MEMORY_CONSISTENCY_VERIFICATION()
    f = starting_frontier;
    while 1 do
        if CHECKING_VIOLENCE(current_execution_graph) then
            if f == starting_frontier then
                return 0;
            else
                REMOVE_EDGE(current_execution_graph);
                f = backtrack(f);
            end if
        else
            if f == terminating_frontier then
                return 1;
            else
                f = select_branch(f);
                ADD_EDGE(current_execution_graph);
            end if
        end if
    end while
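The backtracking search described above can be made concrete with a deliberately simplified Python sketch. It is not the LCHECK implementation: a frontier is modelled as the number of operations already performed on each processor, recursion plays the role of `select_branch` and `backtrack`, and the cycle check on the execution graph is swapped for a direct check that every read returns the latest written value (sequential consistency, with memory initialized to 0). Pending periods prune branches that contradict physical time order. All names and time values are hypothetical.

```python
from typing import Dict, List, Tuple

# (kind, address, value, (t_s, t_e)); kind is "R" or "W".
Op = Tuple[str, str, int, Tuple[int, int]]


def verify_sc(procs: List[List[Op]]) -> bool:
    """True iff some interleaving is sequentially consistent and
    respects the physical time order implied by the pending periods."""

    def search(frontier: Tuple[int, ...], memory: Dict[str, int]) -> bool:
        if all(frontier[i] == len(procs[i]) for i in range(len(procs))):
            return True  # terminating frontier reached without violation
        # Operations that have not been performed yet, on all processors.
        remaining = [op for i, ops in enumerate(procs) for op in ops[frontier[i]:]]
        for i, ops in enumerate(procs):
            if frontier[i] == len(ops):
                continue
            kind, addr, value, (t_s, _) = ops[frontier[i]]
            # Pruning: an unperformed operation whose pending period ended
            # before this one started must be performed first.
            if any(other[3][1] < t_s for other in remaining):
                continue
            if kind == "R" and memory.get(addr, 0) != value:
                continue  # violation on this branch, try another one
            new_memory = dict(memory)
            if kind == "W":
                new_memory[addr] = value
            nxt = frontier[:i] + (frontier[i] + 1,) + frontier[i + 1:]
            if search(nxt, new_memory):
                return True
        return False  # every branch failed: backtrack

    return search(tuple(0 for _ in procs), {})


# Store-buffering pattern where both reads return 0: not sequentially consistent.
dekker = [
    [("W", "x", 1, (0, 10)), ("R", "y", 0, (11, 20))],
    [("W", "y", 1, (0, 10)), ("R", "x", 0, (11, 20))],
]
# A write followed, with overlapping periods, by a read of the new value.
simple_ok = [[("W", "x", 1, (0, 10))], [("R", "x", 1, (5, 20))]]
# Logically explainable (read before write), but the write's pending period
# ends before the read's begins, so physical time order forbids it.
stale_read = [[("W", "x", 1, (0, 10))], [("R", "x", 0, (20, 30))]]
```

The third example shows the point made in the text: without pending periods the stale read would be accepted as a legal sequentially consistent execution, whereas with them it is rejected.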

The complexity of memory consistency verification is the product of two factors: the
complexity of traversing the frontier graph, and the complexity of checking for violations
in the current execution graph at each frontier. Recall that with pending period information,
the frontier analysis presented in Section 3.2 bounds the number of frontiers in the frontier
graph by $O(nC^{p-1})$ from above, and the order analysis presented in Section 3.3 bounds the
complexity of checking for cycles in the execution graph by $O(nCp)$. Therefore, the
overall time complexity of complete memory consistency verification is only $O(n^2 C^p p)$.
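The product of the two bounds can be written out explicitly. The numerical values below ($n = 10^6$, $p = 4$, $C = 8$) are hypothetical and serve only to give a sense of scale for the frontier count.

```latex
\[
\underbrace{O\!\left(nC^{p-1}\right)}_{\text{frontiers}}
\times
\underbrace{O\!\left(nCp\right)}_{\text{check per frontier}}
= O\!\left(n^{2}C^{p}p\right)
\]
% Hypothetical scale: n = 10^6, p = 4, C = 8
\[
nC^{p-1} = 10^{6}\cdot 8^{3} = 5.12\times 10^{8}
\qquad\text{versus}\qquad
\left(\frac{n}{p}\right)^{p} = \left(2.5\times 10^{5}\right)^{4} \approx 3.9\times 10^{21}
\]
```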

It is worth noting that in erroneous multiprocessor systems, there may be bugs that cause
improper performed times for operations: an operation may not have been globally observed
before its obtained end time. For example, in a directory-based cache coherent system, a
store may not be observed by other processors because of some error in the directory, so
its actual performed time is later than the obtained end time. Although our memory
consistency verification method expects that the obtained pending period of each operation
contains its actual performed time, it does not require this as a certified precondition
(although legality was defined in this way), since it can detect both violations of the
memory consistency model and improper performed times.

On the basis of our theoretical investigations, we have implemented a memory consistency
verification tool for CMP, which is named LCHECK [6]. LCHECK can verify a
number of memory consistency models [44], including sequential consistency, processor
consistency, weak consistency and release consistency. It has become an important
verification method for the functional validation of an industrial CMP called Godson-3
[22; 23], and has found many bugs in the memory subsystem of Godson-3.

4.2 Event Ordering Problems 

Event ordering is another interesting topic related to multiprocessor systems. In different 
candidate executions of one parallel program, two operations in the program may occur 
in different orders. The event ordering problems investigate the inevitability or possibility 
of the order between two operations. They are the theoretical foundations of many other 
important problems in multiprocessor system, such as replay of execution [24], debugging 
software [37; 31], intrusion detection [30], and so on. 

In [35], Netzer and Miller gave a formal analysis for the event ordering problems. They 
defined the happened-before, concurrent-with, and ordered-with relations, for operations. 
Each of the three ordering relations was further defined in two senses: must-have and
could-have (similar to the universal and existential quantifiers in symbolic logic).
The must-have sense requires that the ordering is guaranteed in all legal candidate exe- 
cutions of the program (with respect to the given memory consistency model), and the 
could-have sense requires that the ordering occurs in at least one legal candidate execu- 
tion of the program. Netzer and Miller found that it is co-NP-hard to prove any of the 
must-have ordering relations and it is NP-hard to prove any of the could-have ordering 
relations. For the sake of brevity, here we only discuss the must-have happened-before 
(MHB) and could-have happened-before (CHB) relations in detail, while the discussions 
of other relations are similar to the two examples. 

According to Gibbons and Korach [13], a path from the starting frontier to the terminating
frontier on the frontier graph can represent a candidate execution of the program. Hence the
event ordering problems, which depend on the candidate executions, can be investigated
on the frontier graph of the parallel program. Assume that operation $u$ is an operation
of processor $\mathcal{P}_i$, operation $v$ is an operation of processor $\mathcal{P}_j$, and
$\mathbb{P}$ is the set of all paths from the starting frontier to the terminating frontier on
the frontier graph. Then the must-have happened-before (MHB, denoted by
"$\xrightarrow{MHB}$") and could-have happened-before (CHB, denoted by "$\xrightarrow{CHB}$")
relations can be formalized as follows:

$u \xrightarrow{MHB} v \Leftrightarrow (\forall P \in \mathbb{P} : (\exists f \in P : (u \xrightarrow{PO} \mathcal{P}_i(f)) \wedge (\mathcal{P}_j(f) \xrightarrow{PO} v)))$; (3)
$u \xrightarrow{CHB} v \Leftrightarrow (\exists P \in \mathbb{P} : (\exists f \in P : (u \xrightarrow{PO} \mathcal{P}_i(f)) \wedge (\mathcal{P}_j(f) \xrightarrow{PO} v)))$, (4)

where $\mathcal{P}_i(f)$ is the operation of frontier $f$ on processor $\mathcal{P}_i$.
Regarding the frontier graph, the must-have happened-before relation between $u$ and $v$
means that for each path from the starting frontier to the terminating frontier, there is a
frontier $f$ on the path whose operation on processor $\mathcal{P}_i$ is after $u$ in processor
order and whose operation on processor $\mathcal{P}_j$ is before $v$ in processor order. Thus
deciding the must-have happened-before relation between $u$ and $v$ is equivalent to a basic
problem of graph theory: to decide whether all paths (paths on the frontier graph) from one
node (the starting frontier) to another node (the terminating frontier) in a DAG (the frontier
graph) must pass through a set of nodes (the set of frontiers which imply that $u$ happens
before $v$), which has time complexity $O(n_f + e_f)$ [4] (where $n_f$ is the number of nodes
in the frontier graph and $e_f$ is the number of edges in the frontier graph). Therefore, the
time complexity of the must-have happened-before problem is bounded from above by some linear
function of the numbers of nodes and edges in the frontier graph.

Similarly, the could-have happened-before relation between $u$ and $v$
($u \xrightarrow{CHB} v$) means that there exists a path from the starting frontier to the
terminating frontier on the frontier graph which contains a frontier $f$ whose operation on
processor $\mathcal{P}_i$ is after $u$ in processor order and whose operation on processor
$\mathcal{P}_j$ is before $v$ in processor order. It is equivalent to another basic problem of
graph theory: to decide whether there is a path (a path in the frontier graph) from one node
(the starting frontier) to another node (the terminating frontier) in a DAG (the frontier
graph) which passes through a set of nodes (the set of frontiers which imply that $u$ happens
before $v$), which has time complexity $O(n_f + e_f)$ [4]. Hence, the complexity of the
could-have happened-before problem is bounded from above by some
linear function of the numbers of nodes and edges in the frontier graph.
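Both graph problems can be sketched with plain breadth-first searches, each linear in the numbers of nodes and edges. The Python code below is an illustrative sketch and not necessarily the algorithm of [4]: the graph is a hypothetical adjacency-list dictionary, `targets` stands for the set of frontiers implying that $u$ happens before $v$, and the function names are assumptions.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set

Graph = Dict[Hashable, List[Hashable]]


def reachable(graph: Graph, start: Hashable,
              blocked: FrozenSet[Hashable] = frozenset()) -> Set[Hashable]:
    """Nodes reachable from `start` by BFS without entering a blocked node."""
    if start in blocked:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph: Graph) -> Graph:
    """Graph with every edge reversed."""
    rev: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def must_pass(graph: Graph, start: Hashable, end: Hashable,
              targets: Iterable[Hashable]) -> bool:
    """MHB-style query: every start-to-end path visits some target node.

    Equivalent to: with the targets removed, `end` is no longer reachable.
    """
    return end not in reachable(graph, start, blocked=frozenset(targets))


def could_pass(graph: Graph, start: Hashable, end: Hashable,
               targets: Iterable[Hashable]) -> bool:
    """CHB-style query: some start-to-end path visits a target node."""
    from_start = reachable(graph, start)
    to_end = reachable(reverse(graph), end)
    return any(t in from_start and t in to_end for t in targets)


# Toy frontier graph: two alternative paths s -> a -> t and s -> b -> t.
toy = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
```

With `targets = {"a"}` only the could-have query succeeds, while with `targets = {"a", "b"}` both succeed, since every path is forced through one of the two nodes.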

From the above discussions, we find that the complexities of the two event ordering
problems directly relate to the numbers of nodes and edges in the frontier graph. Similar
discussions can be generalized to other event ordering problems. Given the restrictions of
pending periods and physical time orders, the complexities of both event ordering problems
can be reduced to $O(nC^p p)$, since the numbers of nodes and edges of the frontier graph are
both no more than $O(nC^p p)$ with the additional information.

5. CONCLUSION 

In this paper, a novel perspective on utilizing the global clock in multiprocessor systems is
presented, demonstrating that the implications of the global clock, if sufficiently exploited,
can have significant influence on the design and analysis of multiprocessor systems. As we
have pointed out in Section 1, a global clock in a multiprocessor system actually implies
a physical-time-based partial ordering of all operations in the system. Our investigations
reveal that this partial ordering not only provides useful information that supplements
logical time order information, but also provides natural constraints for localizing
the inference between operations. Such natural constraints are defined explicitly as
a partial order named physical time order, which has been proven to be independent of and
consistent with traditional logical time orders.

On the basis of the above views and concepts, we have proposed a number of approaches, 
which are named pending period analysis as a whole, focusing on different aspects of 
making our idea of utilizing global clock practical. These approaches, together with the 
definitions of pending period and physical time order, actually provide solutions for the 
difficulties mentioned at the end of Section 1.1. Concretely, the concept of pending period, 
which is actually a flexible relaxation of the precise performed time, has given a feasible 
solution for handling the hard-to-obtain precise performed time. Moreover, the frontier 
analysis presented in Section 3.2 has limited the complexity of conjecturing or inferring 
the ordering relations through pruning the space of candidate executions, and the order 
analysis presented in Section 3.3 further combines the physical and logical time orders 
to infer the ordering relations inside overlapped pending periods. Finally, the assignment
analysis carried out in Section 3.1 demonstrates that observing the pending periods of only
part of the operations is enough to obtain pending periods for all operations.

The pending period analysis has been adopted in two application problems in multiprocessor
systems. One of these problems, complete memory consistency verification, is
reduced from NP-hard to a time complexity of $O(n^2 C^p p)$ with pending period analysis.
This fast and complete memory consistency verification method has been employed
in industry. Moreover, the two event ordering problems, which were proven to be co-NP-hard
and NP-hard respectively, can now be solved with a time complexity of $O(nC^p p)$
if restricted by pending period information. We hope that more problems in multiprocessor
systems can be facilitated by the view, concepts and approaches proposed in this
paper.